379 research outputs found

    Efficient comparison based string matching

    Get PDF

    Finding 2-Edge and 2-Vertex Strongly Connected Components in Quadratic Time

    Full text link
    We present faster algorithms for computing the 2-edge and 2-vertex strongly connected components of a directed graph, which are straightforward generalizations of strongly connected components. While in undirected graphs the 2-edge and 2-vertex connected components can be found in linear time, in directed graphs only rather simple O(mn)O(m n)-time algorithms were known. We use a hierarchical sparsification technique to obtain algorithms that run in time O(n2)O(n^2). For 2-edge strongly connected components our algorithm gives the first running time improvement in 20 years. Additionally we present an O(m2/logn)O(m^2 / \log{n})-time algorithm for 2-edge strongly connected components, and thus improve over the O(mn)O(m n) running time also when m=O(n)m = O(n). Our approach extends to k-edge and k-vertex strongly connected components for any constant k with a running time of O(n2log2n)O(n^2 \log^2 n) for edges and O(n3)O(n^3) for vertices

    Detecting One-variable Patterns

    Full text link
    Given a pattern p=s1x1s2x2sr1xr1srp = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r such that x1,x2,,xr1{x,x}x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}, where xx is a variable and x\overset{{}_{\leftarrow}}{x} its reversal, and s1,s2,,srs_1,s_2,\ldots,s_r are strings that contain no variables, we describe an algorithm that constructs in O(rn)O(rn) time a compact representation of all PP instances of pp in an input string of length nn over a polynomially bounded integer alphabet, so that one can report those instances in O(P)O(P) time.Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201

    Searching of gapped repeats and subrepetitions in a word

    Full text link
    A gapped repeat is a factor of the form uvuuvu where uu and vv are nonempty words. The period of the gapped repeat is defined as u+v|u|+|v|. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its period. The gapped repeat is called α\alpha-gapped if its period is not greater than αv\alpha |v|. A δ\delta-subrepetition is a factor which exponent is less than 2 but is not less than 1+δ1+\delta (the exponent of the factor is the quotient of the length and the minimal period of the factor). The δ\delta-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length nn the number of maximal α\alpha-gapped repeats is bounded by O(α2n)O(\alpha^2n) and the number of maximal δ\delta-subrepetitions is bounded by O(n/δ2)O(n/\delta^2). Using the obtained upper bounds, we propose algorithms for finding all maximal α\alpha-gapped repeats and all maximal δ\delta-subrepetitions in a word of length nn. The algorithm for finding all maximal α\alpha-gapped repeats has O(α2n)O(\alpha^2n) time complexity for the case of constant alphabet size and O(nlogn+α2n)O(n\log n + \alpha^2n) time complexity for the general case. For finding all maximal δ\delta-subrepetitions we propose two algorithms. The first algorithm has O(nloglognδ2)O(\frac{n\log\log n}{\delta^2}) time complexity for the case of constant alphabet size and O(nlogn+nloglognδ2)O(n\log n +\frac{n\log\log n}{\delta^2}) time complexity for the general case. The second algorithm has O(nlogn+nδ2log1δ)O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta}) expected time complexity

    Pattern Matching in Multiple Streams

    Full text link
    We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.Comment: 13 pages, 1 figur

    Faster Approximate String Matching for Short Patterns

    Full text link
    We study the classical approximate string matching problem, that is, given strings PP and QQ and an error threshold kk, find all ending positions of substrings of QQ whose edit distance to PP is at most kk. Let PP and QQ have lengths mm and nn, respectively. On a standard unit-cost word RAM with word size wlognw \geq \log n we present an algorithm using time O(nkmin(log2mlogn,log2mlogww)+n) O(nk \cdot \min(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}) + n) When PP is short, namely, m=2o(logn)m = 2^{o(\sqrt{\log n})} or m=2o(w/logw)m = 2^{o(\sqrt{w/\log w})} this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.Comment: To appear in Theory of Computing System

    An Optimal Algorithm for Tiling the Plane with a Translated Polyomino

    Full text link
    We give a O(n)O(n)-time algorithm for determining whether translations of a polyomino with nn edges can tile the plane. The algorithm is also a O(n)O(n)-time algorithm for enumerating all such tilings that are also regular, and we prove that at most Θ(n)\Theta(n) such tilings exist.Comment: In proceedings of ISAAC 201

    Online Detection of Repetitions with Backtracking

    Full text link
    In this paper we present two algorithms for the following problem: given a string and a rational e>1e > 1, detect in the online fashion the earliest occurrence of a repetition of exponent e\ge e in the string. 1. The first algorithm supports the backtrack operation removing the last letter of the input string. This solution runs in O(nlogm)O(n\log m) time and O(m)O(m) space, where mm is the maximal length of a string generated during the execution of a given sequence of nn read and backtrack operations. 2. The second algorithm works in O(nlogσ)O(n\log\sigma) time and O(n)O(n) space, where nn is the length of the input string and σ\sigma is the number of distinct letters. This algorithm is relatively simple and requires much less memory than the previously known solution with the same working time and space. a string generated during the execution of a given sequence of nn read and backtrack operations.Comment: 12 pages, 5 figures, accepted to CPM 201

    Ternary Syndrome Decoding with Large Weight

    Get PDF
    The Syndrome Decoding problem is at the core of many code-based cryptosystems. In this paper, we study ternary Syndrome Decoding in large weight. This problem has been introduced in the Wave signature scheme but has never been thoroughly studied. We perform an algorithmic study of this problem which results in an update of the Wave parameters. On a more fundamental level, we show that ternary Syndrome Decoding with large weight is a really harder problem than the binary Syndrome Decoding problem, which could have several applications for the design of code-based cryptosystems

    Palindromic Decompositions with Gaps and Errors

    Full text link
    Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also imposing a lower bound to the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time O(n log n * g) and space O(n g), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string in maximal \delta-palindromes (i.e. palindromes with \delta errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time O(n (g + \delta)) and space O(n g).Comment: accepted to CSR 201
    corecore